Data Replication file for "Roads and Innovation"
by Ajay Agrawal, Alberto Galasso, and Alexander Oettl

===========
Input Files
===========

highways_innovation-msalevel.tab: Tab-delimited data used to produce the MSA-level results (Tables 1-6, 9, and 10)
highways_innovation-patentlevel.tab: Tab-delimited data used to produce the patent-level results (Tables 7 and 8)

highways_innovation.do: Stata .do file that replicates the results in the paper.  Compatible with Stata 13 and later on Windows, Mac, and Linux.  Replication requires the latest version of ivreg2, which supports factor variables.  If errors occur, run:  ssc install ivreg2, replace
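
A minimal pre-flight sketch for the ivreg2 requirement above. It is not part of highways_innovation.do; it only assumes that ivreg2 is distributed through SSC, as the install command above indicates.

```stata
* Hypothetical pre-flight check; not part of highways_innovation.do
version 13

* -which- leaves a nonzero return code in _rc when the command is not found
capture which ivreg2
if _rc != 0 {
    ssc install ivreg2, replace
}

* Show the version header of the installed copy
which ivreg2
```

Finding ivreg2 does not guarantee that the installed copy is recent enough for factor variables. If the .do file still stops on a factor-variable term, running ssc install ivreg2, replace forces an update.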


==================
Dataset Dictionary
==================

**************************************
* File: highways_innovation-msalevel *
**************************************

File Sources Legend:
DT: Duranton and Turner (2012)
USPTO: United States Patent and Trademark Office

Notes: 	The dataset consists of 1,320 MSA-class observations (220 MSAs x 6 classes).
	 	Only issued patents are used; dates refer to the application year, not the issue year.
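
The dimensions stated in the notes can be checked after loading the file. This is a sketch, not the authors' loading code: it assumes the .tab file sits in the working directory and that import delimited (available from Stata 13) reads it with the options shown.

```stata
* Sketch only: the file location and import options are assumptions
import delimited using "highways_innovation-msalevel.tab", delimiters("\t") clear

* 220 MSAs x 6 classes = 1,320 MSA-class cells
count
assert r(N) == 1320

* Each msa-class pair should identify exactly one observation
isid msa class

* msasample flags the first observation in each MSA, so it should mark 220 rows
count if msasample == 1
assert r(N) == 220
```

Note that import delimited lowercases variable names by default; adding the case(preserve) option keeps names exactly as they appear in the file header.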

Variable Name					Source		Description
-------------					------		-----------

cites83         				USPTO		Total citations received by patents applied for in 1983
cites88         				USPTO		Total citations received by patents applied for in 1988
class		           			USPTO		1-digit NBER Patent class (http://www.nber.org/patents/)
cooling_dd      				DT			Mean cooling degree-days
div1            				DT			census division 1
div2            				DT			census division 2
div3            				DT			census division 3
div4            				DT			census division 4
div5            				DT			census division 5
div6            				DT			census division 6
div7            				DT			census division 7
div8            				DT			census division 8
eleva_rug       				DT			ruggedness*elevation_range
elevat_range_msa				DT			Elevation range in MSA (m.)
elevat_range_msa2				DT			(max msa elev. - min msa elev)^2
heating_dd      				DT			Mean heating degree-days
highmsadensity	  				USPTO		MSA density (msanuminv83/arealandsqmi) is above the sample mean.                 
highvelocity    				USPTO		technology class is high velocity (Computers [2] and Electronics [4])
hwy1947         				DT			msa 1947 planned highways in km
l_hwy1947       				DT			ln(msa 1947 planned highways in km)
l_mean_income   				DT			ln(mean income)
l_pix_pre1850   				DT			ln(msa pixels of pre-1850 exp. route)
l_pop20         				DT			ln(1920 msa pop)
l_pop30         				DT			ln(1930 msa pop)
l_pop40         				DT			ln(1940 msa pop)
l_pop50         				DT			ln(1950 msa pop)
l_rail1898      				DT			ln(msa 1898 rail km)
l_rd_km_IH_83   				DT			ln(msa 1983 interstate highways in km)
labslarge83     				USPTO		number of large labs in msa-class
labssmall83     				USPTO 		number of small labs in msa-class
ln_cites83      				USPTO		ln(total citations received by patents applied for in 1983 + 1)
ln_cites88      				USPTO		ln(total citations received by patents applied for in 1988 + 1)
ln_largecites83 				USPTO		ln(total citations received by patents applied for in 1983 by large firms + 1)
ln_largecites88 				USPTO		ln(total citations received by patents applied for in 1988 by large firms + 1)
ln_msacites83   				USPTO		ln(msacites83 + 1)
ln_msacites88   				USPTO		ln(msacites88 + 1)
ln_msainv73     				USPTO		ln(number of distinct inventors at the msa-level 1973 + 1)
ln_msanuminv78  				USPTO		ln(number of distinct inventors at the msa-level 1978 + 1)
ln_msanuminv83  				USPTO		ln(number of distinct inventors at the msa-level 1983 + 1)
ln_msapats83	    			USPTO		ln(total issued patents at the msa-level applied for in 1983 + 1)
ln_msapats88    				USPTO		ln(total issued patents at the msa-level applied for in 1988 + 1)
ln_nonmove_dist_83				USPTO		ln(distance between patents in 1983 and the patents they cite in the same msa; nonmovers)
ln_nonmove_dist_88				USPTO		ln(distance between patents in 1988 and the patents they cite in the same msa; nonmovers)
ln_nonmove_samemsa_new_pats_83	USPTO		ln(new assignee patents in 1983 that cite a patent in same msa + 1; nonmovers)
ln_nonmove_samemsa_new_pats_88	USPTO		ln(new assignee patents in 1988 that cite a patent in same msa + 1; nonmovers)
ln_nonmove_samemsa_pats_83		USPTO		ln(patents in 1983 that cite a patent in same msa + 1; nonmovers)
ln_nonmove_samemsa_pats_88		USPTO		ln(patents in 1988 that cite a patent in same msa + 1; nonmovers)
ln_numinv73        				USPTO		ln(number of distinct inventors in 1973 + 1)
ln_numinv78     				USPTO		ln(number of distinct inventors in 1978 + 1)
ln_numinv83						USPTO		ln(number of distinct inventors in 1983 + 1)
ln_pats83						USPTO		ln(number of patents in 1983 + 1)                 
ln_pats88       				USPTO		ln(number of patents in 1988 + 1)                 
ln_samemsa_dist_83				USPTO		ln(distance between patents in 1983 and the patents they cite in the same msa)
ln_samemsa_dist_88				USPTO		ln(distance between patents in 1988 and the patents they cite in the same msa)              
ln_samemsa_new_pats_83			USPTO		ln(new assignee patents in 1983 that cite a patent in same msa + 1)
ln_samemsa_new_pats_88			USPTO		ln(new assignee patents in 1988 that cite a patent in same msa + 1)
ln_samemsa_pats_83				USPTO		ln(patents in 1983 that cite a patent in same msa + 1)
ln_samemsa_pats_88				USPTO		ln(patents in 1988 that cite a patent in same msa + 1)
ln_smallcites83 				USPTO		ln(total citations received by patents applied for in 1983 by small firms + 1)
ln_smallcites88 				USPTO		ln(total citations received by patents applied for in 1988 by small firms + 1)
logco           				DT	 		ln(cooling_dd + 1)                 
loghe           				DT	 		ln(heating_dd + 1)                 
logpc           				DT			ln(pc_aquifer_msa + 1)
msa             				USPTO		Metropolitan Statistical Areas defined in 1993 by US Office of Management and Budget
msacites83      				USPTO		total citations at the msa-level received by patents applied for in 1983
msacites88      				USPTO		total citations at the msa-level received by patents applied for in 1988
msanuminv83     				USPTO		number of distinct inventors at the msa-level 1983
msapats83       				USPTO		total issued patents at the msa-level applied for in 1983
msapats88       				USPTO		total issued patents at the msa-level applied for in 1988
msasample       				USPTO		dummy set to 1 for the first observation in each msa to run msa-specific regressions
no83inventors   				USPTO		dummy set to 1 if there are no inventors in the msa-class cell in 1983
numinv83        				USPTO		number of distinct inventors in 1983
pats83          				USPTO		number of patents in 1983
pats88          				USPTO		number of patents in 1988              
pc_aquifer_msa  				DT			% of MSA overlying aquifers
pc_aquifer_msa2 				DT			(% msa overlying consolidated aquifer)^2
pix_pre1850 					DT			msa pixels of pre-1850 exp. route
rail1898        				DT			kilometers of railroad routes in 1898 contained in each MSA
rd_km_ih_83						DT			msa 1983 interstate highways in km
ruggedness_msa					DT			Terrain ruggedness index in MSA
ruggedness_msa2					DT			(Terrain ruggedness index in MSA)^2
s_poor_80       				DT	 	 	msa share poor in 1980
s_somecollege_80				DT			msa share with some college in 1980
seg1980_ghetto  				DT			1980 segregation index
smanuf77        				DT			Share msa employment in manufacturing in 1977
spatial_sighway47				USPTO		Distance weighted average of 1947 planned highways in other MSAs
spatial_sighway83				USPTO		Distance weighted average of 1983 interstate highways in other MSAs
spatial_sail    				USPTO		Distance weighted average of 1898 rail kms in other MSAs
spatial_soutes  				USPTO		Distance weighted average of pre-1850 exp. route in other MSAs
star_83         				USPTO		dummy set to 1 if a star inventor exists in the msa-class in 1983
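
Several entries above are defined as ln(x + 1) of another variable in the same file. A short consistency check, assuming the MSA-level file is already in memory with the variable names listed above, could look like the sketch below. The tolerance is an arbitrary choice that allows for float storage; it is not taken from the paper.

```stata
* Sketch: confirm that ln_<name> equals ln(<name> + 1) for a few documented pairs
foreach v in pats83 pats88 msacites83 msacites88 {
    tempvar chk
    generate double `chk' = ln(`v' + 1)
    * reldif() is the relative difference |a - b| / (|b| + 1)
    assert reldif(`chk', ln_`v') < 1e-5
    drop `chk'
}
```

An assert failure here would point to a mismatch between the dictionary and the data, or to a file that was read with different types than assumed.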


*****************************************
* File: highways_innovation-patentlevel *
*****************************************

File Sources Legend:
DT: Duranton and Turner (2012)
USPTO: United States Patent and Trademark Office

Notes: 	The dataset consists of 10,776 within-msa citations from 1988 patents and 10,776 control citations for a total sample size of 21,552.
	 	Only issued patents are used; dates refer to the application year, not the issue year.


Variable Name					Source		Description
-------------					------		-----------

year							USPTO		year of cited patent
citedsubcat						USPTO		2-digit NBER Patent Class of cited patent
subcat							USPTO		2-digit NBER Patent Class of citing patent
cited							USPTO		set to 1 if the cited patent is a true citation and 0 if it is a control citation
ln_dist							USPTO		ln(distance between cited and citing patents)
ln_dist_other					USPTO		ln(mean distance between cited and citing patents in same technology fields but other MSAs)
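
The sample composition described in the notes for this file (10,776 true citations plus 10,776 control citations) can be verified with the cited indicator. As with any loading code outside the .do file, the path and import options are assumptions.

```stata
* Sketch only: the file location and import options are assumptions
import delimited using "highways_innovation-patentlevel.tab", delimiters("\t") clear

* Total sample: 10,776 + 10,776 = 21,552
count
assert r(N) == 21552

* True citations are coded cited == 1, control citations cited == 0
count if cited == 1
assert r(N) == 10776
count if cited == 0
assert r(N) == 10776
```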



